In recent times, numerous digital image manipulation detection approaches
have been proposed to detect which processing operations were applied to
manipulate digital images. Most of these approaches consider the situation in
which an image is manipulated by only one manipulation operation.
However, practical image manipulation often involves multiple
manipulation operations. It is important to detect multiple image
manipulation operations and the order in which they were applied to
establish the origin and genuineness of a given image as well as the
processing history it has gone through. In this article, we propose a new
method to determine multiple image processing operations and operation
chains based on a convolutional neural network (CNN) and the local optimal
oriented pattern (LOOP). The CNN extracts and learns image manipulation
traces from the LOOP maps of the input images, which are then classified
using softmax, extra-tree, and extreme gradient boosting (XGBOOST)
classifiers. Detailed experiments
show that the proposed model can attain overall detection accuracies of
99.81% and 99.15% in identifying different image manipulations and
manipulation operation chains, respectively.
1. INTRODUCTION

In recent years, digital images have become the principal means of conveying information due to
their expressive capacities and ease of distribution [1]. As a mode of communication, digital images have
been shown to be a more effective information carrier than text. As a result, the use of images as a key
source of evidence in judicial proceedings and criminal investigations is becoming more widespread than ever.

However, the very nature of digital images raises concerns in most of these applications. The
information in digital images can be easily manipulated to convey fabricated or
deceptive information. The introduction of advanced computers and user-friendly image manipulation tools
allows anyone with basic image editing skills to alter images easily without leaving behind any
human-detectable traces [2]. As a result, image modifications for harmful reasons have become ubiquitous in our
society, resulting in the spread of fake news, erroneous verdicts, and reputational damage, among others [3].
Therefore, it is important to come up with techniques for verifying the authenticity of digital images before
using them to make critical decisions.

A variety of image manipulation detection approaches have been proposed [4]-[8]. However, most
of the early approaches have focused on detecting specific image manipulations such as image splicing and
copy-move. They cannot identify more than one type of image tampering. Thus, methods capable of detecting
multiple image manipulations are required.

In line with the foregoing, recent works in image manipulation detection have concentrated on
building multi-purpose methods capable of detecting different types of image manipulation. Many multi-purpose
image manipulation detection techniques based on handcrafted features have been presented. The work of [9], for
example, proposed a universal image manipulation detection approach using residual-based features and the
spatial rich model (SRM) [10] features used in the steganalysis domain. In a similar study, Farooq et al. [11]
investigated and demonstrated the performance of SRM and local binary pattern (LBP) [12] in identifying
various image manipulations using synthesized datasets from the first Institute of Electrical and Electronics
Engineers (IEEE) information forensics and security technical committee (IFS-TC) image manipulation detection
competition [13]. A recent study by Peng et al. [14] designed a universal feature set for multi-purpose image
forensics that is capable of detecting different kinds of image manipulation operations concurrently using
residual-based features.

Following the success of deep learning techniques, especially convolutional neural networks (CNNs),
in a range of visual identification studies [15]-[18], recent research in image forensics has also tried to employ
this approach in solving the problem of digital image manipulation. Among the deep learning-based methods,
Singh and Goyal [19] designed a general-purpose image manipulation detection approach that uses a deep
learning technique. They proposed the use of residual dense blocks in building their model to improve
classification by utilizing global residual learning and local dense connections. Wei et al. [20] proposed a
multi-purpose image tampering detection approach based on reinforcement learning that could design a deep neural
network automatically without manual intervention. The method consists of a learning agent which is trained
to choose the layers of a CNN automatically according to the Q-learning algorithm [21].
Singhal et al. [22] described a multi-purpose image manipulation detection technique based on CNN and
frequency domain features of image residuals. In another work, Bibi and Abbasi [23] suggested an image
manipulation detection approach based on CNN, auto-encoder, and ensemble subspace discriminant (ESD)
learner. They utilized two pre-trained networks, AlexNet and visual geometry group (VGG), for feature
extraction. The extracted features were then forwarded to a stacked auto-encoder for image manipulation
detection and, finally, the ESD learner was used for classification.

Although recent studies have enhanced the performance of image manipulation detection systems,
the existing methods have mainly focused on distinguishing different image manipulation operations. Thus,
determining both the image manipulation operations and the order in which these operations are applied
remains an open challenge. Therefore, this work proposes a novel multipurpose CNN and local optimal
oriented pattern (LOOP) based image manipulation and manipulation chain detection system that combines
the strengths of both handcrafted and deep learning methods. To test the proposed method’s performance,
we evaluated it in a variety of experiments, and the results obtained show that the proposed method can
detect image manipulations and manipulation chains with average accuracies of 99.81% and 99.15%,
respectively.


2. METHOD 

This section gives an overview of the proposed approach for detecting image manipulation and
manipulation operation chains, as illustrated in Figure 1. The proposed method starts by transforming a color
input image to a grayscale image, which is then used to create the LOOP map of the input image. The designed
CNN architecture then extracts and learns deep features from the input images’ LOOP maps, which are
forwarded to the dense layers for classification. To further examine the effectiveness of the proposed CNN in
feature extraction, deep features extracted from the proposed CNN’s second dense layer were used to train
and evaluate two other classifiers, extreme gradient boosting (XGBOOST) and extra tree, on the
detection and classification of image manipulations. In the subsections that follow, the key aspects of the
proposed method are detailed.


2.1. Datasets 
To train and assess the proposed model’s performance, we collected 9605 images from three
publicly available image databases: the Microsoft common objects in context (MS COCO) dataset [24],
the Bossbase dataset [25], and the IEEE image manipulation dataset [13]. These datasets contribute
3555, 1050, and 5000 images, respectively, to form the composite dataset of 9605 images. The full set of
9605 images was then used to synthesize the training and testing datasets for the proposed model.
To create the complete data for the proposed experiments, each of the 9605 images was pre-processed
as follows. Each image was first centrally cropped to 256x256 and converted to grayscale
to represent an unaltered image. Then each of the 9605 images was manipulated using the five
manipulation operations in Table 1, resulting in a total of 57630 (9605x6) images: 9605 original images and
their corresponding five tampered versions. The LOOP maps for each of the dataset’s six classes were then
generated and utilized as input to the proposed model. Table 1 summarizes the types of manipulation
operations and the parameters used for creating our experimental datasets.








[Figure 1: flow diagram. RGB image (256x256) -> RGB to gray -> grayscale image -> computing LOOP -> LOOP map -> feature extraction and learning by proposed CNN -> deep features -> classification by XGBOOST, softmax, and extra tree classifiers -> XGBOOST, softmax, and extra tree outputs]

Figure 1. Illustration of the proposed method


Table 1. Image manipulation operations and their parameters

Operations                                                  Parameters
Gaussian blurring (GB)                                      Kernel size (K_size) = 3x3, 5x5
Gamma correction (GC)                                       Gamma (γ) = 0.5, 0.7, 0.8, 0.9
Median filtering (MF)                                       Kernel size (K_size) = 3x3, 5x5
Joint photographic experts group (JPEG) compression (JC)    Quality factor (QF) = 80, 90
Contrast enhancement (CE)                                   Not applicable (n/a)
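
As an illustration of how one of the operations in Table 1 acts on pixel values, the following minimal Python sketch applies gamma correction to an 8-bit grayscale image through a lookup table. The power-law mapping out = 255 * (in / 255)^γ is the common textbook form and is an assumption here, since the exact implementation used to build the dataset is not stated; the function name `gamma_correct` is illustrative.

```python
def gamma_correct(img, gamma):
    """Apply power-law gamma correction to an 8-bit grayscale image.

    img is a list of rows of integers in [0, 255]. The textbook mapping
    out = 255 * (in / 255) ** gamma is assumed; it is not taken from the paper.
    """
    # Precompute the 256-entry lookup table once, then map every pixel through it.
    lut = [round(255 * (v / 255) ** gamma) for v in range(256)]
    return [[lut[p] for p in row] for row in img]


# Example: a 2x2 patch corrected with one of the gamma values from Table 1.
patch = [[0, 64], [128, 255]]
brightened = gamma_correct(patch, 0.5)
```

With γ < 1 the mid-tones are brightened while 0 and 255 stay fixed, which is why the four γ values in Table 1 produce visually mild but statistically detectable changes.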





2.2. LOOP computation 

We used the LOOP maps of the images as the input of the proposed model. LOOP is a measure of
local image features recently proposed by Chakraborti et al. [26] and mostly used in the description and
classification of texture. It is an enhancement of LBP and the local directional pattern (LDP) [27], which have
found many applications in computer vision research, including image manipulation detection
[11], [28], [29]. It has the capability to capture hidden texture variations that may result from image
manipulation operations. This motivated us to combine it with CNN for image manipulation detection.

Given an image I, the LOOP map can be computed using (1) and (2) [26]. Let $i_c$ denote the image’s
intensity at pixel $(x_c, y_c)$, and let $i_n$ ($n = 0, 1, \ldots, 7$) denote the intensity of a pixel in the 3x3
neighborhood of $(x_c, y_c)$, ignoring the center pixel. The 8 Kirsch masks are oriented in the direction of the
8 neighboring pixels ($n = 0, 1, 2, \ldots, 7$), providing a measure of the strength of intensity variation in those
directions. Assume that $M_n$ denotes the responses of the eight Kirsch masks that correspond to the pixels with
intensities $i_n$ ($n = 0, 1, 2, \ldots, 7$). Each of these pixels is given an exponential weight $w_n$ (a number
between 0 and 7) based on the rank of the magnitude of $M_n$ among the 8 Kirsch mask activations. The LOOP
value at $(x_c, y_c)$ is then

$$\mathrm{LOOP}(x_c, y_c) = \sum_{n=0}^{7} f(i_n - i_c) \times 2^{w_n} \qquad (1)$$

where

$$f(x) = \begin{cases} 0, & \text{if } x \geq 0 \\ 1, & \text{otherwise} \end{cases} \qquad (2)$$

The LOOP map is created by aggregating the occurrence of LOOP values across the entire image.
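
To make (1) and (2) concrete, the following is a minimal pure-Python sketch of the LOOP value of a single pixel and of the resulting map. Several details are assumptions made for illustration and are not specified in the text: the circular ordering of the eight neighbors, ranking the signed Kirsch responses in ascending order, and breaking ties by neighbor index. The function names `loop_value` and `loop_map` are hypothetical. The thresholding follows (2) as written; since the weights $w_n$ are a permutation of 0 to 7, using the opposite thresholding convention would simply yield 255 minus this value.

```python
# The 8 neighbour offsets (row, col) in circular order, starting east (assumed order).
RING = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def loop_value(img, r, c):
    """LOOP code of the interior pixel (r, c) of a 2-D list of intensities."""
    centre = img[r][c]
    neigh = [img[r + dr][c + dc] for dr, dc in RING]
    total = sum(neigh)

    # Kirsch response toward neighbour n: weight 5 on n and its two ring
    # neighbours, weight -3 on the remaining five (the centre has weight 0).
    resp = []
    for n in range(8):
        three = neigh[n - 1] + neigh[n] + neigh[(n + 1) % 8]
        resp.append(5 * three - 3 * (total - three))

    # w_n is the rank (0..7) of response n. Ascending order on the signed
    # response and index-based tie-breaking are assumptions of this sketch.
    order = sorted(range(8), key=lambda n: (resp[n], n))
    w = [0] * 8
    for rank, n in enumerate(order):
        w[n] = rank

    # Eq. (2) as given in the text: f(x) = 0 if x >= 0, else 1.
    return sum((0 if neigh[n] - centre >= 0 else 1) << w[n] for n in range(8))


def loop_map(img):
    """LOOP values of all interior pixels (the one-pixel border is dropped)."""
    h, w = len(img), len(img[0])
    return [[loop_value(img, r, c) for c in range(1, w - 1)] for r in range(1, h - 1)]
```

Because each Kirsch response is 5 times the sum of three adjacent neighbors minus 3 times the sum of the other five, the eight masks never need to be stored explicitly.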


2.3. Convolutional neural network design 

We designed a convolutional neural network to extract and learn image manipulation traces directly
from LOOP maps of the input image. The designed CNN is made up of 13 layers, with the first 10 being
convolutional layers and the last three being dense layers. The network is fed a 256x256 LOOP map,
from which the convolutional filters extract low-level features.

The final classification is performed by three different classifiers: softmax, extra tree, and
XGBOOST. The features extracted and learned by the convolutional layers of the proposed CNN are
passed on to the dense layers with a softmax activation, which classifies an image as either manipulated or
authentic. However, because other classifiers may surpass the performance of a dense layer with softmax,
as described in [2], we also used the extra tree and
XGBOOST classifiers to produce the proposed model’s final classification results. The types of layers,
parameters, and output size of each layer of the proposed CNN architecture are summarized in Figure 2.


Detection of image manipulation with convolutional neural network and ... (Ali Ahmad Aminu) 


632 o ISSN: 1693-6930 


[Figure 2: layer diagram of the proposed CNN. Labels recoverable from the figure, from input to output:
Input image, 1x(256x256)
Conv1: 5x(7x7x1), output 5x(250x250)
Conv2: 144x(5x5x5), output 144x(123x123); BN + elu
Conv3: 64x(3x3x144), output 64x(61x61); BN + elu
Conv4: 64x(3x3x64), output 64x(61x61); BN + elu
Conv5: 64x(3x3x64), output 64x(61x61); BN + elu + pooling, output 64x(30x30)
Conv6: 64x(3x3x64), output 64x(30x30); BN + elu
(Conv7 label not legible); BN + elu + pooling, output 64x(15x15)
Conv8: 64x(3x3x64), output 64x(15x15); BN + elu
Conv9: 64x(3x3x64), output 64x(15x15); BN + elu + pooling, output 64x(7x7)
Conv10: 128x(3x3x64), output 128x(7x7); BN + elu + pooling, output 128x(3x3)
FC1 + elu; FC2 + elu, 1x300 vector
Softmax, extra tree, and XGBOOST classifiers; class probabilities]

Figure 2. Proposed CNN architecture for image manipulation detection
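
The feature-map sizes in Figure 2 can be sanity-checked with the standard convolution output-size formula, floor((n + 2p - k) / s) + 1. The sketch below traces the spatial size from the 256x256 input down to the final 3x3 maps. The strides, paddings, and 2x2 pooling windows are not stated in the text; the values used here are assumptions chosen because they reproduce the sizes shown in the figure, and the Conv7 entry, which is not legible in the figure, is assumed to mirror Conv6.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding); strides and paddings are assumptions
# inferred from the feature-map sizes printed in Figure 2.
LAYERS = [
    ("conv1", 7, 1, 0),
    ("conv2", 5, 2, 0),
    ("conv3", 3, 2, 0),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool1", 2, 2, 0),
    ("conv6", 3, 1, 1),
    ("conv7", 3, 1, 1),   # not legible in Figure 2; assumed to mirror conv6
    ("pool2", 2, 2, 0),
    ("conv8", 3, 1, 1),
    ("conv9", 3, 1, 1),
    ("pool3", 2, 2, 0),
    ("conv10", 3, 1, 1),
    ("pool4", 2, 2, 0),
]


def trace_sizes(n=256):
    """Return a dict mapping each layer name to its spatial output size."""
    sizes = {}
    for name, kernel, stride, padding in LAYERS:
        n = out_size(n, kernel, stride, padding)
        sizes[name] = n
    return sizes
```

Under these assumptions the last pooling stage yields 128 maps of size 3x3, that is, 1152 values entering the first dense layer.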


2.4. Implementation details 

The proposed model was trained using the Adam optimizer with the default settings for the
moments (β1 = 0.9, β2 = 0.999, and ε = 10^-7), and a learning rate of 0.0001. The proposed model’s
weights were initialized using the Xavier initialization approach, with their biases set to 0, except for the
three dense layers, whose biases were set to 0.05 and whose weights were initialized using random numbers
drawn from a zero-mean Gaussian distribution with a standard deviation of 0.01. In each training iteration,
the proposed model was trained on a batch of 16 images to minimize the categorical cross-entropy loss,
without shuffling of the training data across epochs.
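
For reference, one Adam update with the hyperparameters listed above can be written as follows. This is a generic sketch of the standard Adam rule with bias correction on plain Python lists, not the training code used for the experiments; the function name `adam_step` and the toy parameter and gradient values are illustrative.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update; t is the 1-based step count. Returns (theta, m, v)."""
    # Exponential moving averages of the gradient and of its square.
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    # Bias correction compensates for the zero initialisation of m and v.
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    theta = [p - lr * mh / (math.sqrt(vh) + eps)
             for p, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v


# Toy example: two parameters, first step from zero-initialised moments.
theta, m, v = adam_step([1.0, -2.0], [0.5, -4.0], [0.0, 0.0], [0.0, 0.0], t=1)
```

On the first step each parameter moves by almost exactly the learning rate in the direction opposite to its gradient, regardless of the gradient's magnitude.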

Suppose $\Theta$ is the parameter vector denoting the weights corresponding to the image
manipulation detection task. The categorical cross-entropy loss can then be formulated as:

$$L(\Theta) = -\frac{1}{M} \sum_{m=1}^{M} \sum_{n=1}^{N} \mathbb{1}\left(y^{(m)} = n\right) \log p\left(y^{(m)} = n \mid x^{(m)}; \Theta\right) \qquad (3)$$

where M and N represent the total number of image samples and the number of classes, respectively,
$\mathbb{1}(\cdot)$ denotes an indicator function which equals 1 if $y^{(m)} = n$ and 0 otherwise, and
$y^{(m)}$ and $x^{(m)}$ denote the image label and the feature of sample m.
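
As a small worked instance of the categorical cross-entropy loss, the sketch below evaluates it for a batch of predicted class probabilities. Averaging over the M samples and the use of 0-based integer class labels are assumptions of this sketch; the function name `categorical_cross_entropy` is illustrative.

```python
import math


def categorical_cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class.

    probs[m][n] is the predicted probability of class n for sample m;
    labels[m] is the 0-based true class of sample m (an assumption here).
    """
    # The indicator in the loss keeps only the log-probability of the true class.
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


# Two samples, two classes: the loss is -(ln 0.5 + ln 0.75) / 2.
loss = categorical_cross_entropy([[0.5, 0.5], [0.25, 0.75]], [0, 1])
```

The loss is zero only when every sample's true class receives probability 1, and it grows without bound as that probability approaches 0.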


3. RESULTS AND DISCUSSION 

We conducted several experiments under various experimental settings to evaluate the proposed 
image manipulation detection method. The model’s performance in detecting individual manipulation 
operations is evaluated in the first experiment. In the second experiment, all of the tampering operations 
specified in Table 1 are used to test the model’s ability to detect and classify multiple image tampering. 
Finally, we put the model to the test in terms of recognizing the image manipulation operation chain in 
compressed and uncompressed image formats. 


3.1. Individual image manipulation detection 

The proposed model is utilized in this experiment to determine whether a given image is authentic or has
been manipulated using any of the manipulation techniques given in Table 1. Five different experiments were
performed, each corresponding to one manipulation technique. To carry out these experiments, we
synthesized 5 manipulated versions of the authentic images as described in section 2.1, which resulted in 5
classes of images, each corresponding to one manipulation operation given in Table 1 and having more than
9000 images.
Using the implementation details described in section 2.4, the proposed binary classifier was trained
and validated for 100 epochs on subsets of 14000 and 2000 images, respectively, and the remaining
3000 images were reserved for testing the model. To further evaluate the feature extraction capabilities of the
proposed method, the features extracted by the proposed model were also used to train and assess two
tree-based models, XGBOOST and extra tree, for the detection of individual image manipulations.
The manipulation detection results obtained by the three models are presented in Table 2.

The proposed CNN can detect the various types of image manipulations with an accuracy of at least
99.10%, as shown in Table 2. The proposed model obtained its best detection rate of 100% for both
JPEG compression (JC) and Gaussian blurring (GB). This indicates that the
CNN and LOOP model can detect the different individual image manipulation operations.

Additionally, we observed that the extra tree-based CNN matched or improved on the detection rates of
the proposed CNN for each manipulation
operation, with the exception of GB, where the proposed CNN obtained better detection
accuracy. Similarly, the XGBOOST-based CNN classifier improved on the proposed CNN’s performance in
gamma correction (GC) and median filtering (MF), but not in contrast enhancement
(CE), where the proposed CNN outperformed the XGBOOST-based CNN. In general, the extra tree-based
CNN outperformed the other two classifiers in recognizing individual image manipulation operations.
The obtained results show that the proposed approach is robust at extracting the image
manipulation fingerprints required for identifying various types of image manipulation operations and that its
performance can be enhanced by using other classifiers.


Table 2. Individual image manipulation detection accuracies (%) of the proposed CNN and tree-based classifiers

Classifiers/manipulation    CE       GB       GC       MF       JC
Softmax                     99.10    100      99.90    99.64    100
Extra tree                  99.13    99.14    100      99.90    100
XGBOOST                     99.03    100      99.92    99.94    100





3.2. Comparison with other methods 

We compare the proposed model’s individual image manipulation detection results with the results
of [2], [9], [14] to further confirm the proposed model’s performance in identifying individual image
manipulations. The proposed model’s accuracies on individual image manipulations in comparison to
previously reported results are presented in Table 3. The highlighted numbers represent the best outcome for
each manipulation operation. A “-” indicates that the given technique did not take that
manipulation operation into account in its evaluation. As shown in Table 3, our approach
outperforms the methods of [2], [14] in GB, MF, and JC detection. Furthermore, on
GC our method improved on the result of [9] by 3.24%. The only exceptions were CE and MF,
in which the method of [9] outscores the proposed method by 0.82% and 0.05%, respectively. These results
show that our approach has superior feature extraction and discriminating abilities, making it more resilient than
existing methods in the majority of binary classification problems.


Table 3. Comparison with similar approaches from the literature

Methods/manipulations    CE       GB       GC       MF       JC
Proposed                 99.13    100      100      99.94    100
[2]                      -        99.95    -        99.71    99.66
[9]                      99.95    100      96.76    99.99    99.94
[14]                     -        96.4     -        96.7     95.8





3.3. Multiple image manipulation detection 

In this experiment, the proposed approach is utilized to detect different types of image
manipulations. The training, validation, and test sets were all created in the same manner as stated in the
preceding experiment. Table 4 and Figure 3 show the classification results of the three classifiers in terms of
average detection accuracies and confusion matrices, respectively.

The experimental results indicate that our approach can detect multiple image manipulations with an
average detection rate of 99.81% and each manipulation operation with an accuracy of at least 98.27%, as shown in
Figure 3(a). The average detection rates of the extra tree-based CNN and XGBOOST-based CNN are also shown
in Table 4. The extra tree-based CNN and XGBOOST-based CNN could detect each of the manipulation
operations with an accuracy of not less than 97.80% in Figure 3(b) and 98.80% in Figure 3(c), respectively. When
the results obtained are observed more closely, it can be seen that all the classifiers could detect the multiple
image manipulations with an average detection rate of not less than 99.43%. In contrast to the results in the individual
image manipulation detection, the proposed model obtained better results than the extra tree-based and
XGBOOST-based CNNs in the multiple image manipulation detection. These results further demonstrate the
robustness of the proposed method in feature extraction and the discriminative power of these features.
Furthermore, the results also show that our approach can successfully suppress the impact of image content,
thus allowing the proposed model to extract the image manipulation fingerprints needed for image
manipulation detection.
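
The per-class and average detection rates discussed above are the kind of figures read off a confusion matrix: the per-class rate is a diagonal entry divided by its row sum, and the overall accuracy is the trace divided by the total count. The following sketch shows the computation on hypothetical counts; the matrix values, the convention that rows are true classes and columns are predictions, and the function names are all assumptions for illustration, not data from the experiments.

```python
def per_class_accuracy(cm):
    """Fraction of each true class (row) that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


def overall_accuracy(cm):
    """Correct predictions (the diagonal) over all predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


# Hypothetical 3-class confusion matrix with 100 test images per class.
cm = [
    [98, 2, 0],
    [1, 99, 0],
    [0, 0, 100],
]
rates = per_class_accuracy(cm)
average = overall_accuracy(cm)
```

With equal class sizes, as in this toy matrix, the overall accuracy equals the mean of the per-class rates.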


Table 4. Multiple image tampering detection accuracies (%) of the proposed CNN and tree-based classifiers

Methods             Proposed CNN    Extra tree-CNN    XGBOOST-CNN
Average accuracy    99.81           99.43             99.60

[Figure 3: three confusion-matrix panels]

Figure 3. Comparing confusion matrices of (a) the proposed CNN, (b) extra tree-CNN, and
(c) XGBOOST-CNN


3.4. Manipulation operation chain detection 

We also tested the effectiveness of the proposed model in identifying image manipulation operation
chains in both uncompressed and compressed image formats. To conduct these experiments, each of the
original images was manipulated by a chain of (A-B) manipulation operations. A-B indicates that the original
image is manipulated first using operation “A” and then using operation “B”. For instance, MF-GB
corresponds to first applying median filtering and then Gaussian blurring. Applying the
image manipulation operation chains (CE-GB), (GB-CE), (GB-MF), (GC-GB), (GC-MF), and
(MF-GB), we obtained 9605 images for the original class and for each of the six manipulation operation chains,
resulting in a total of 67235 (9605x7) images.
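
The reason the order of a chain leaves a distinguishable trace can be seen on a toy example. The sketch below applies MF-GB and GB-MF to a 5x5 image containing a single bright pixel and obtains different results. It is an illustration only: the 3x3 binomial kernel used for Gaussian blurring, the choice to copy border pixels unchanged, and the names `filter3x3`, `OPS`, and `apply_chain` are assumptions, not the implementation used to build the datasets.

```python
def filter3x3(img, reduce_window):
    """Apply reduce_window to every interior 3x3 window; border pixels are copied."""
    out = [row[:] for row in img]
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            window = [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = reduce_window(window)
    return out


# 3x3 binomial approximation of a Gaussian kernel (weights sum to 16); assumed.
GAUSS_3X3 = [1, 2, 1, 2, 4, 2, 1, 2, 1]

OPS = {
    # Median filtering: the middle value of the 9 sorted window pixels.
    "MF": lambda img: filter3x3(img, lambda w: sorted(w)[4]),
    # Gaussian blurring: weighted average of the window, rounded to an integer.
    "GB": lambda img: filter3x3(
        img, lambda w: round(sum(k * p for k, p in zip(GAUSS_3X3, w)) / 16)
    ),
}


def apply_chain(img, chain):
    """Apply the operations named in a chain such as "MF-GB" from left to right."""
    for name in chain.split("-"):
        img = OPS[name](img)
    return img


# A black 5x5 image with one bright pixel in the centre.
spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 255
mf_then_gb = apply_chain(spike, "MF-GB")
gb_then_mf = apply_chain(spike, "GB-MF")
```

Median filtering first removes the isolated pixel entirely, so the subsequent blur has nothing to spread; blurring first spreads the pixel into a 3x3 blob that the median filter then only attenuates. The two chains therefore leave different residual patterns.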

Additionally, we considered two different cases in which a manipulation operation chain is
followed, or not followed, by JPEG compression with a quality factor of 80. The LOOP maps of each class of
the datasets were then obtained and served as the input of the model. The datasets were then divided at
random to create the training, validation, and test sets, which were used for training, validating,
and testing the proposed model, respectively.

Table 5 gives the average detection accuracies of the proposed models in detecting image
manipulation operation chains in the uncompressed and compressed image formats. The confusion matrices of
the proposed method for identifying image manipulation operation chains in the uncompressed and
compressed image formats are presented in Figure 4 and Figure 5, respectively. From the results,
it can be observed that each of the three classifiers could detect all the manipulation operation chains with
high detection rates in both cases.

Additionally, looking at the diagonal elements of Figure 4, we can notice that, in the uncompressed
case, the proposed CNN could detect each manipulation operation chain with an accuracy of at least 98.0%
(Figure 4(a)). Similarly, the extra tree and XGBOOST-based CNNs were able to
detect all the manipulation operation chains with accuracies of at least 99.36% (Figure 4(b)) and 99.40%
(Figure 4(c)), respectively. The extra tree and XGBOOST-based CNNs thus improved on the performance of the
proposed CNN by 1.36% and 1.40%, respectively, in the uncompressed manipulation operation chain detection.

Moreover, in the compressed manipulation operation chain detection, the proposed model, extra
tree-based CNN, and XGBOOST-based CNN could detect each manipulation operation chain with at least
98.20% in Figure 5(a), 95.7% in Figure 5(b), and 86.5% in Figure 5(c), respectively, as can be seen from the
diagonal elements of Figure 5. When compared to their performances in manipulation chain detection in the
uncompressed image format, the performance of the extra tree and XGBOOST-based CNNs declined by
3.66% and 12.9%, respectively, indicating the effect of applying JPEG compression after the manipulation
operation chains. These results indicate that, in addition to being good at identifying multiple image
manipulation operations, the proposed model can also detect numerous manipulation operation chains in
images that have been subjected to JPEG compression with high detection accuracies, which is a more
difficult and practical problem.


Table 5. Manipulation operation chain detection accuracies (%) of the proposed model and tree-based classifiers

Image type/methods    Proposed CNN    Extra tree-CNN    XGBOOST-CNN
Uncompressed          99.15           99.57             99.51
Compressed            99.18           98.92             97.20

[Figure 4: three confusion-matrix panels]

Figure 4. Comparing confusion matrices of the proposed CNN and tree-based models for manipulation chain
detection in uncompressed images (a) proposed CNN, (b) extra tree-CNN, and (c) XGBOOST-CNN

[Figure 5: three confusion-matrix panels]

Figure 5. Comparing confusion matrices of the proposed CNN and tree-based models for manipulation chain
detection in compressed images (a) proposed CNN, (b) extra tree, and (c) XGBOOST


4. CONCLUSION 

In this paper, we proposed a novel image manipulation detection approach that uses CNN and local
feature descriptors to enhance image manipulation detection. Unlike prior methods that focused only on
detecting different image tampering operations, the proposed method can detect both multiple manipulation
operations and the order in which they were applied. We combine the strengths of LOOP, a handcrafted
feature, and CNN to build a robust image manipulation detection method. Our approach extracts image
manipulation fingerprints directly from the LOOP maps of the input images, allowing it to capture the various
fingerprints required to identify different image manipulation types. We evaluated the proposed model’s
performance in several experiments, and the results show that it was effective in identifying different image
manipulation operations as well as the order in which these operations were applied.